A domain-knowledge modeling of hospital-acquired infection risk in Healthcare personnel from retrospective observational data: A case study for COVID-19

Introduction: Hospital-acquired infections of communicable viral diseases (CVDs) have been posing a tremendous challenge to healthcare workers globally. Healthcare personnel (HCP) is facing a consistent risk of viral infections, and subsequently higher rates of morbidity and mortality.

Materials and methods: We proposed a domain-knowledge-driven infection risk model to quantify the individual HCP and the population-level risks. For individual-level risk estimation, a time-variant infection risk model is proposed to capture the transmission dynamics of CVDs. At the population-level, the infection risk is estimated using a Bayesian network model constructed from three feature sets, including individual-level factors, engineering control factors, and administrative control factors. For model validation, we investigated the case study of the Coronavirus disease, in which the individual-level and population-level infection risk models were applied. The data were collected from various sources such as COVID-19 transmission databases, health surveys/questionnaires from medical centers, U.S. Department of Labor databases, and cross-sectional studies.

Results: Regarding the individual-level risk model, the variance-based sensitivity analysis indicated that the uncertainty in the estimated risk was attributed to two variables: the number of close contacts and the viral transmission probability. Next, the disease transmission probability was computed using a multivariate logistic regression applied for a cross-sectional HCP data in the UK, with the 10-fold cross-validation accuracy of 78.23%. Combined with the previous result, we further validated the individual infection risk model by considering six occupations in the U.S. Department of Labor O*Net database. The occupation-specific risk evaluation suggested that the registered nurses, medical assistants, and respiratory therapists were the highest-risk occupations. For the population-level risk model validation, the infection risk in Texas and California was estimated, in which the infection risk in Texas was lower than that in California. This can be explained by California’s higher patient load for each HCP per day and lower personal protective equipment (PPE) sufficiency level.

Conclusion: The accurate estimation of infection risk at both individual level and population levels using our domain-knowledge-driven infection risk model will significantly enhance the PPE allocation, safety plans for HCP, and hospital staffing strategies.


Suggested revisions
-The abstract lacks a definition of "PPE".
-The results in the abstract are populated with numbers without any explanation or context. It is hard to tell from the numbers alone whether the results are good or bad. Please refine the abstract to reflect the main takeaway points for a broader audience.
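As an illustration of the kind of context that would help, the sketch below compares a cross-validated accuracy with a majority-class baseline. It is a minimal example under stated assumptions: the labels are synthetic with a hypothetical 70/30 class split (the class balance of the UK data is not given in the abstract), and the helper names are mine, not the authors'.

```python
import random
from collections import Counter


def kfold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def majority_baseline_cv_accuracy(labels, k=10, seed=0):
    """Mean k-fold accuracy of always predicting the training majority class."""
    accuracies = []
    for test_idx in kfold_indices(len(labels), k, seed):
        held_out = set(test_idx)
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        majority = Counter(train_labels).most_common(1)[0][0]
        correct = sum(1 for i in test_idx if labels[i] == majority)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Synthetic labels, NOT the study data: assumed 70% negative, 30% positive.
labels = [0] * 700 + [1] * 300
baseline = majority_baseline_cv_accuracy(labels, k=10)
reported = 0.7823  # 10-fold cross-validation accuracy quoted in the abstract

print(f"baseline={baseline:.3f} reported={reported:.3f} gain={reported - baseline:.3f}")
```

If the baseline on the real data were close to the reported value, the accuracy alone would say little about the model; a clear gap would give readers a reason to trust the number.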
-Introduction: "there has been an increasing hospital outbreaks" → "there has been an increasing number of hospital outbreaks".
-Introduction: "Quantitative models have been used as an alternative to mathematical models." My impression is that quantitative and mathematical models are synonyms. Perhaps the authors could use an alternative form: "Other quantitative approaches have been used…"
-Introduction: "Section 2 discusses about the model formulation and model validation" → "Section 2 discusses the model formulation and model validation".
-Introduction: I missed a review of modeling studies for nosocomial HCP infections at both the individual and population levels. Since the paper is devoted to those two different problems, the introduction should reflect that. Instead, the authors briefly describe some previous work without making that distinction explicit.
-Introduction: "To overcome the above research gaps, this paper proposes a probabilistic domain-knowledge model". The term "domain-knowledge" appears throughout the manuscript. Is there a precise definition of the term in the context of this study (or references) that could be included in the introduction?

-Materials and Methods: "…(1) an individual-level infection risk model that quantifies the risk of infection of an HCP… (2) a population-level infection risk indicator model that estimates the infection risk under working conditions at a medical facility"
Odd sentence construction. I suggest: "(1) an individual-level infection risk model for HCP and (2) a population-level model that estimates the infection risk under working conditions…"
-Materials and methods, Section 2.1: In Eq. 4, the first equation is replaced by a summation indexed by m, and new variables h_m appear. Perhaps this is a standard calculation and I am missing some elementary steps, but it is unclear how h is broken down into h_m. Please clarify.
-Materials and methods, Section 2.1: " where is the length of the close contact with person ( )" It seems the index should be m instead of r.
-Materials and methods, Section 2.1: The order in which the individual model is presented seems counterintuitive. After reading it carefully, I understood that the first step is to calculate the probabilities P ( ), ( )→ j from the logistic regression with the covariate data, after which the risk indicator can be estimated. The authors leave the logistic regression as the last step, which is confusing. Please consider addressing this point to improve the clarity of the text.
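To make the suggested ordering concrete, here is a minimal sketch of the two steps in the order I would expect them to be presented: first estimate per-contact transmission probabilities with a logistic regression, then combine them into an individual risk. Everything in it is an assumption for illustration: the data are synthetic, the single covariate is a hypothetical exposure score, and the combination rule 1 - prod(1 - p_m) for independent contacts is a common textbook form that may differ from the authors' Eq. 4.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def transmission_probability(w, x):
    """Step 1: predicted transmission probability for one contact's covariates."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def individual_risk(contact_probs):
    """Step 2 (assumed rule): 1 - prod(1 - p_m) over independent contacts."""
    no_infection = 1.0
    for p in contact_probs:
        no_infection *= 1.0 - p
    return 1.0 - no_infection


# Synthetic training data, NOT the UK cohort: one exposure score in [0, 1].
rng = random.Random(0)
X = [[rng.random()] for _ in range(400)]
y = [1 if rng.random() < sigmoid(-2.0 + 3.0 * x[0]) else 0 for x in X]

w = fit_logistic(X, y)
contacts = [[0.2], [0.5], [0.9]]  # hypothetical covariates of three contacts
probs = [transmission_probability(w, x) for x in contacts]
risk = individual_risk(probs)
print("per-contact probabilities:", [round(p, 3) for p in probs])
print("individual risk:", round(risk, 3))
```

Presenting the regression first makes it clear that the probabilities are inputs to the risk indicator rather than a by-product of it.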
-Materials and Methods, Section 2.2: " We denote (•) as the abbreviated notation for the function of , (•) and in Eq. (9) " The role of Eq. 9 in this sentence is unclear, and the authors should avoid referring to an equation before it appears in the text. Please clarify.
-Materials and Methods, Section 2.2: "the population risk (•) is estimated using a Bayesian network…"  2) is / max{ }, then it is unclear why Eq 6 was mentioned in the Materials and Methods section. Please clarify or omit any equation that is not used in the results section.
-Discussion: It is good practice to introduce a first paragraph reviewing the main problem, what methods were developed, and the main takeaways of a study. Please consider adding such a paragraph.